Abstract

System performance for networks composed of interconnected subsystems can 
be increased if the traditionally separated subsystems are jointly optimized. Re- 
cently, parallel and distributed optimization methods have emerged as a pow- 
erful tool for solving estimation and control problems in large-scale networked 
systems. In this paper we review and analyze the optimization-theoretic con- 
cepts of parallel and distributed methods for solving coupled optimization prob- 
lems and demonstrate how several estimation and control problems related to 
complex networked systems can be formulated in these settings. The paper 
presents a systematic framework for exploiting the potential of the decompo- 
sition structures as a way to obtain different parallel algorithms, each with a 
different tradeoff among convergence speed, message passing amount and dis- 
tributed computation architecture. Several specific applications from estimation 
and process control are included to demonstrate the power of the approach. 
Keywords: Estimation, cooperative and distributed control, networks of 

1. Introduction



In many application fields, the notion of networks has emerged as a central, 
unifying concept for solving different problems in systems and control theory 
such as analysis, process control and estimation. We live and operate in a net- 
worked world. We drive to work on networks of roads and communicate with 
each other using an elaborate set of devices such as phones or computers, that 
connect wirelessly and through the internet. Traditional networks include trans- 
portation networks (roads, rails) and networks of utilities (water, electricity, 
gas). But more recent examples of the increasing impact of networks include 
information technology networks (internet, mobile phones, acoustic networks, 
etc), information networks (co-author networks, bibliographic networks), social 
networks (collaborations, organizations), and biological and genetic networks. 

These networks are often composed of multiple subsystems characterized 
by complex dynamics and mutual interactions such that local decisions have 
long-range effects throughout the entire network. Many problems associated 
to networked systems, such as state estimation and control, can be posed as 
coupled optimization problems (see e.g. [10], [11], [17], [21], [25], [28], [40], etc). Note
that in these systems the interaction between subsystems gives rise to coupling
in the cost or constraints, but with a specific algebraic structure, in particular 
sparse matrix representation that could be exploited in numerical algorithms. 
Therefore, in order to design an overall decision architecture for such complex 
networks we need to solve large coupled optimization problems but with spe- 
cific structure. The major difficulty in these problems is that due to their size, 
communication restrictions, or requirements on robustness, often no central de- 
cisions can be taken; instead, the decisions have to be taken locally. In such a 
set-up, single units, or local agents, must solve local optimization subproblems 
and then they must negotiate their outcomes and requirements with their neigh- 
bors in order to achieve convergence to the global optimal solution. Basically, 
there are two general optimization approaches: 

(i) "Centralized" optimization algorithms: In this class the specific structure of 
the system is exploited, as it represents considerable sparsity in the optimization
problem due to the local coupling between optimization variables (sometimes 
referred to as separable optimization problems). The sparsity of the problem, 
given by the influences between the subsystems, leads to coupling constraints 
represented by sparse matrices. Though parts of the algorithms will be paral- 
lelized, the parallelization in these algorithms is not restricted by e.g. limited 
communication between subsystems and is just for the sake of exploiting spar- 
sity. In summary, "centralized" algorithms benefit from the sparsity induced 
by the networked system and solve the resulting optimization problems on a 
parallel computer architecture. Several standard parallel and distributed
optimization methods can be found in textbooks. Various survey papers
also exist on optimization-based distributed control. In the 1970s, very
comprehensive overviews were presented by Tamura and in [23]. More recently,
in [42] the current status of research in the field of coordinated
optimization-based control is presented. Many different control topologies can be considered
in distributed control, which have been reviewed recently in \4^. When there 
is no need to solve the separable optimization problem on a parallel computer 
architecture, an alternative would be to solve the global optimization problem 
using sparse solvers that take into account the sparse structure of the problem 
at the linear algebra level of the optimization algorithm. In general, this choice 
could lead to faster algorithms in terms of CPU time than distributed or parallel 
algorithms. 

(ii) Distributed optimization algorithms (sometimes referred to as distributed 
multi-agent optimization algorithms): In contrast to the "centralized" algo- 
rithms, distributed algorithms on graphs have to satisfy an extra constraint, 
namely their computations shall be performed on all nodes in parallel, and 
the communication between nodes is restricted to the edges of the graph, i.e. 
such algorithms do not use all-to-all communication protocols. In many complex
networked systems the desired behavior can be formulated as coupled
optimization problems but with restrictions on communication due to the
special network topology: e.g. estimation in sensor networks, consensus and
rendezvous problems in multi-agent systems, resource allocation in computer
networks [35]. Some existing distributed methods that take into account
explicitly information restrictions in the network combine consensus
negotiations (as an efficient method for information fusion) with subgradient
methods.

The goal of this paper is twofold: (i) to establish a relationship between
estimation and control in networked systems and distributed optimization methods
and demonstrate the effectiveness of utilizing optimization-theoretic approaches 
for controlling such complex systems; (ii) motivated by this connection, to build 
upon optimization-based results to better accommodate a broader class of
estimation and control problems. The core of this paper consists of Section 2,
covering three applications of estimation and control that appear in the
context of networked systems and showing how we can reformulate them as
coupled optimization problems. One of the key contributions of this paper is to
provide an accessible, yet relatively comprehensive, overview of three classes of 
decomposition schemes from mathematical programming for solving distribu- 
tively coupled optimization problems. We demonstrate how the decomposition 
schemes suggest network architectures and protocols with different properties 
in terms of convergence speed and coordination overhead. We also present new 
decomposition methods that are more efficient in terms of convergence speed 
than some classical decomposition schemes. 

The paper is organized as follows. In Section 2 we introduce different
estimation and control problems that appear in the context of complex systems
with interacting subsystem dynamics and then we show how we can reformulate
them as coupled optimization problems. In Section 3 we present several
parallel and distributed methods for solving this type of structured
optimization problems and analyze their performance. Section 3 thus serves both as a
review of the necessary background and a summary of our new extensions on 
decomposition methods. For each of the applications, numerical experiments 
on different parallel and distributed algorithms are provided. 

2. Estimation and control problems in networks

In this section we formulate different estimation and control problems for
systems consisting of interconnected subsystems. In Subsection 2.1 we present
a state estimation problem for a system, using a network of sensors which must
exchange information in order to reach a consensus on the state estimated for the
entire system. In Subsection 2.2 we present the problem of optimal control
for a large-scale system, whose subsystems are coupled with their neighbors but
whose objective function is decoupled. Finally, in Subsections 2.3 and 2.4 we
discuss the cooperative control problem for a group of systems (agents), which
have decoupled or coupled dynamics but share a common goal.

2.1. State estimation problem 

In this section we formulate the distributed state estimation problem for
systems using a sensor network based on the moving horizon estimation (MHE)
approach [10, 11, 40, 41]. Sensor networks can be employed in many
applications, such as monitoring, exploration, surveillance or tracking targets over
specific regions. We consider the concept of MHE, as this framework offers
multiple advantages: since a particular minimization problem must be solved on-line
at each step, the observer is optimal with respect to the associated cost, and
moreover, constraints on the state and on the noise can be taken into account.

The state estimation problem can be posed as follows. We assume that
each sensor in the network measures some variables of a process, computes a 
local estimate of the entire state of the system, and exchanges the computed 
estimates with its neighbors. The solution to the estimation problem consists in 
finding a methodology which guarantees that all sensors asymptotically reach a 
reliable estimate of the overall state of the system. For the observed process we 
consider the following nonlinear dynamics: 

\[
x_{t+1} = \phi(x_t) + w_t,
\]

where $x_t \in X \subseteq \mathbb{R}^n$ is the state vector and $w_t \in W \subseteq \mathbb{R}^n$ represents a white
noise with covariance equal to $Q$. We also assume that the sets $X$ and $W$
are convex. The initial condition $x_0$ is a random variable with mean $\bar{x}_0$ and
covariance $\Pi_0$. Measurements on the state vector are performed by $M$ sensors$^1$,
according to the following sensing model:

\[
y_t^i = \theta^i(x_t) + v_t^i, \quad \forall i = 1, \dots, M,
\]

where $v_t^i$ represents white noise with covariance matrix $R_i$. The functions
$\phi$ and $\theta^i$ can be in general nonlinear.

For a given estimation horizon $N \geq 1$, at time $k$, given the past measurements
$y_{k-N}^i, \dots, y_k^i$ provided by the $i$th sensor and the estimate $\hat{x}_{k-N}$, we formulate
the moving horizon estimation (MHE) at $k$ as the solution to the following
optimization problem [40, 41]:

\[
\begin{aligned}
\min_{x_t,\, w_t} \ & \sum_{i=1}^{M} \sum_{t=k-N}^{k} \|v_t^i\|^2_{R_i^{-1}} + \sum_{t=k-N}^{k-1} \|w_t\|^2_{Q^{-1}} + \|x_{k-N} - \hat{x}_{k-N}\|^2_{\Pi_{k-N}^{-1}} && \text{(1)} \\
\text{s.t.:} \ & x_{t+1} = \phi(x_t) + w_t, && \text{(1.1)} \\
& x_t \in X, \ w_t \in W \quad \forall t, && \text{(1.2)}
\end{aligned}
\]

where the matrix $\Pi_{k-N}$ is computed recursively from a Riccati difference
equation in a centralized way. For the linear case, the distributed computation of
this matrix can be done in many ways: e.g. using the steady-state MHE
formulation (i.e. computing off-line $\Pi_\infty$, which is the solution of the corresponding
algebraic Riccati equation) or updating $\Pi_{k-N}$ for all the sensors in the same way
(using a common covariance matrix $R$ for all sensors in the Riccati difference
equation update). For the nonlinear case, the update of $\Pi_{k-N}$ in a distributed
fashion is still an open issue.
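
As an illustration of the steady-state option mentioned above, the following minimal sketch iterates a Riccati difference equation until it reaches a fixed point, which then solves the corresponding algebraic Riccati equation. It is only a sketch under simplifying assumptions that are not part of the paper: a scalar linear system $x_{t+1} = a x_t + w_t$ with a single scalar measurement $y_t = c x_t + v_t$, noise variances `q` and `r`, and the standard prediction-form recursion. The function names and numerical values are hypothetical.

```python
def riccati_step(pi, a, c, q, r):
    """One step of the scalar Riccati difference equation (prediction form).

    Assumed model (illustrative only): x_{t+1} = a*x_t + w_t, y_t = c*x_t + v_t,
    with Var(w_t) = q and Var(v_t) = r.
    """
    # pi_next = a^2*pi - a^2*pi^2*c^2 / (c^2*pi + r) + q
    return a * a * pi - (a * pi * c) ** 2 / (c * c * pi + r) + q


def steady_state_pi(a, c, q, r, pi0=1.0, tol=1e-12, max_iter=10_000):
    """Iterate the difference equation until successive values agree within tol."""
    pi = pi0
    for _ in range(max_iter):
        nxt = riccati_step(pi, a, c, q, r)
        if abs(nxt - pi) < tol:
            return nxt
        pi = nxt
    return pi


if __name__ == "__main__":
    pi_inf = steady_state_pi(a=0.9, c=1.0, q=0.1, r=0.5)
    # At the fixed point one more step leaves the value (numerically) unchanged.
    print(pi_inf, riccati_step(pi_inf, 0.9, 1.0, 0.1, 0.5))
```

Since the fixed point depends only on the model data, every sensor can compute it off-line without exchanging information, which is what makes the steady-state formulation attractive in a distributed setting.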

Note that $v_t^i = y_t^i - \theta^i(x_t)$ and, using the dynamics, we can write
$\sum_{t=k-N}^{k} \|v_t^i\|^2_{R_i^{-1}}$ as a function depending only on $(x_{k-N}, w_{k-N}, \dots, w_{k-1})$.



$^1$Throughout the paper we will use the convention that every superscript indicates a
sensor/subsystem index.
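Therefore, by eliminating the states in (1) using the dynamics (1.1) and
introducing the notations:

\[
\mathbf{x} = [x_{k-N}^T \ w_{k-N}^T \cdots w_{k-1}^T]^T, \qquad
f^i(\mathbf{x}) = \sum_{t=k-N}^{k} \|v_t^i\|^2_{R_i^{-1}} + \frac{1}{M} \Big( \sum_{t=k-N}^{k-1} \|w_t\|^2_{Q^{-1}} + \|x_{k-N} - \hat{x}_{k-N}\|^2_{\Pi_{k-N}^{-1}} \Big),
\]

the MHE problem (1) can be recast as an optimization problem with decoupled
cost but a common decision variable $\mathbf{x}$ (DCx):

\[
\text{(DCx)} \qquad \min_{\mathbf{x}} \ \sum_{i=1}^{M} f^i(\mathbf{x}) \quad \text{s.t.:} \ \mathbf{x} \in \mathbf{X},
\]

where the set $\mathbf{X}$ is obtained from the state and noise constraints (1.2).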

We assume that the communication network among sensors is described by
a graph $G = (V, E)$, where the nodes in $V = \{1, \dots, M\}$ represent the sensors
and the edge $(i, j) \in E \subseteq V \times V$ models that sensor $j$ sends information to
sensor $i$. Then, the main challenge is to provide distributed algorithms for solving
problem (1) or equivalently (DCx) which guarantee that all the sensors
asymptotically reach a reliable estimate of the state variables using the information
exchange model given by the graph $G$.

Example 2.1 In the particular case where the state and noise constraints $x_t \in X$
and $w_t \in W$ are described by linear inequalities (i.e. $X$ and $W$ are polyhedral
sets) and the dynamics of the process and of the sensors are linear, i.e.

\[
x_{t+1} = A x_t + w_t, \qquad y_t^i = C_i x_t + v_t^i, \quad \forall i = 1, \dots, M,
\]

the MHE problem (1) can be recast as a separable convex quadratic program
with decoupled cost but a common decision variable in the form (DCx):

\[
\begin{aligned}
\min_{\mathbf{x}} \ & \sum_{i=1}^{M} \mathbf{x}^T H_i \mathbf{x} + q_i^T \mathbf{x} && \text{(2)} \\
\text{s.t.:} \ & \mathbf{x} \in \mathbf{X},
\end{aligned}
\]

where the matrices $H_i$ are positive definite and the constraint set $\mathbf{X}$ becomes in
this case polyhedral (described only by linear inequalities).
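
To make the recasting in Example 2.1 concrete, the sketch below builds the quadratic data of one sensor's local cost after the states have been eliminated, and checks it against a direct evaluation of the same cost along the trajectory. It is a minimal illustration under assumptions that are not taken from the paper: a scalar system (so that $A$, $C_i$, $R_i$, $Q$ and $\Pi_{k-N}$ reduce to the numbers `a`, `c_i`, `r_i`, `q`, `pi`), and an equal split of the terms shared by all sensors (process noise and arrival cost) with weight $1/M$. The linear term is called `g` in the code to avoid a clash with the noise variance `q`; all function names are hypothetical.

```python
def local_qp(a, c_i, r_i, q, pi, x_hat, y_i, M):
    """Return (H, g, const) such that f_i(z) = z'Hz + g'z + const.

    z = [x_{k-N}, w_{k-N}, ..., w_{k-1}] has length N + 1, with N = len(y_i) - 1.
    Scalar system assumed: x_{t+1} = a*x_t + w_t, y_t = c_i*x_t + v_t.
    """
    N = len(y_i) - 1
    d = N + 1
    H = [[0.0] * d for _ in range(d)]
    g = [0.0] * d
    const = 0.0
    for s in range(N + 1):
        # State s steps after k-N as a linear map of z:
        # x = a^s * x_{k-N} + sum_{j<s} a^(s-1-j) * w_{k-N+j}
        S = [a ** s] + [a ** (s - 1 - j) if j < s else 0.0 for j in range(N)]
        for p in range(d):
            g[p] += -2.0 * c_i * S[p] * y_i[s] / r_i
            for l in range(d):
                H[p][l] += c_i * c_i * S[p] * S[l] / r_i
        const += y_i[s] ** 2 / r_i
    # Shared terms, split equally among the M sensors (an assumption).
    H[0][0] += 1.0 / (M * pi)
    g[0] += -2.0 * x_hat / (M * pi)
    const += x_hat ** 2 / (M * pi)
    for j in range(1, d):
        H[j][j] += 1.0 / (M * q)
    return H, g, const


def quad_value(H, g, const, z):
    """Evaluate z'Hz + g'z + const."""
    d = len(z)
    quad = sum(z[p] * H[p][l] * z[l] for p in range(d) for l in range(d))
    return quad + sum(g[p] * z[p] for p in range(d)) + const


def local_cost_direct(z, a, c_i, r_i, q, pi, x_hat, y_i, M):
    """Evaluate the same local cost by simulating the dynamics."""
    N = len(y_i) - 1
    x, cost = z[0], 0.0
    for s in range(N + 1):
        cost += (y_i[s] - c_i * x) ** 2 / r_i   # measurement residual
        if s < N:
            x = a * x + z[1 + s]                # x_{t+1} = a*x_t + w_t
    cost += sum(w * w for w in z[1:]) / (M * q)
    cost += (z[0] - x_hat) ** 2 / (M * pi)
    return cost
```

The first diagonal entry of `H` is strictly positive because of the arrival-cost term and the remaining ones because of the process-noise term, which is the scalar counterpart of the positive definiteness of $H_i$ stated in the example.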

2.2. Distributed optimal control problem 

The application that we will discuss in this section is the distributed control 
of large-scale networked systems with interacting subsystem dynamics, which 
can be found in a broad spectrum of applications ranging from traffic networks, 
wind farms, to interconnected chemical plants. Distributed control is promising 
in applications for complex systems, since this framework allows us to design lo- 
cal subsystem-based controllers that take into account the interactions between 
different subsystems and physical constraints. 

We consider discrete-time systems which can be decomposed into $M$ subsystems
described by difference equations of the form:

\[
x_{t+1}^i = \phi^i(x_t^j, u_t^j; \ j \in \mathcal{N}^i), \quad \forall i = 1, \dots, M, \tag{3}
\]

where $x_t^i \in \mathbb{R}^{n_i}$ and $u_t^i \in \mathbb{R}^{m_i}$ represent the state and the input of the $i$th
subsystem. The index set $\mathcal{N}^i$ contains the index $i$ and all the indices of the
subsystems which interact with the subsystem $i$. We also assume that the input
and state sequences must satisfy local constraints:

\[
x_t^i \in X^i, \quad u_t^i \in U^i, \quad \forall i = 1, \dots, M, \ \forall t \geq 0, \tag{4}
\]

where the constraint sets $X^i \subseteq \mathbb{R}^{n_i}$ and $U^i \subseteq \mathbb{R}^{m_i}$ are usually compact sets. The
system performance over a prediction horizon of length $N$ is expressed through
a stage cost and a final cost, which are composed of individual costs for each
subsystem $i$ and have the form:

\[
\sum_{t=0}^{N-1} \ell^i(x_t^i, u_t^i) + \ell_f^i(x_N^i).
\]

The centralized optimal control problem over a prediction horizon $N$ reads:

\[
\begin{aligned}
\min_{x_t^i,\, u_t^i} \ & \sum_{i=1}^{M} \sum_{t=0}^{N-1} \ell^i(x_t^i, u_t^i) + \sum_{i=1}^{M} \ell_f^i(x_N^i) && \text{(5)} \\
\text{s.t.:} \ & x_0^i = x^i, \quad x_{t+1}^i = \phi^i(x_t^j, u_t^j; \ j \in \mathcal{N}^i), && \text{(5.1)} \\
& x_t^i \in X^i, \ u_t^i \in U^i \quad \forall t, i, && \text{(5.2)}
\end{aligned}
\]

where $x^i$ are the values of the initial state for subsystem $i$. Note that a similar
formulation of distributed control for coupled subsystems with decoupled costs
has been given e.g. in [8] in the context of distributed model predictive
control.

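Now, we show that the optimization problem (5) can be recast as a separable
optimization problem with a particular structure. To this purpose, we denote
with $\mathbf{X}^i = (X^i)^N \times (U^i)^N$ and

\[
\mathbf{x}^i = [(x_1^i)^T \cdots (x_N^i)^T \ (u_0^i)^T \cdots (u_{N-1}^i)^T]^T, \qquad
f^i(\mathbf{x}^i) = \sum_{t=0}^{N-1} \ell^i(x_t^i, u_t^i) + \ell_f^i(x_N^i).
\]

With these notations, problem (5) now reads as an optimization problem with
decoupled cost and sparse coupled constraints (DCCC):

\[
\text{(DCCC)} \qquad \min_{\mathbf{x}^1, \dots, \mathbf{x}^M} \ \sum_{i=1}^{M} f^i(\mathbf{x}^i) \quad
\text{s.t.:} \ \mathbf{x}^i \in \mathbf{X}^i, \ h^i(\mathbf{x}^j; \ j \in \mathcal{N}^i) = 0 \ \ \forall i,
\]

where the coupled constraints $h^i(\mathbf{x}^j; \ j \in \mathcal{N}^i) = 0$ are obtained from the coupling
between the subsystems, i.e. by stacking the constraints (5.1) for a given $i$.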

The centralized optimization problem (5) or (DCCC) becomes interesting
if the computations can be distributed among the subsystems (agents), can be 
done in parallel and the amount of information that the agents must exchange 
is limited. In comparison with the centralized approach, a distributed strategy 
offers a series of advantages: first, the numerical effort is considerably smaller 
since we solve low dimension problems in parallel and secondly such a design 
is modular, i.e. adding or removing subsystems does not require any controller 
redesign. 
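Example 2.2 Many networked systems, e.g. wind farms [20], interconnected
chemical processes, or urban traffic systems, can be decomposed
into $M$ appropriate linear subsystems:

\[
x_{t+1}^i = A_i x_t^i + B_i u_t^i + \sum_{j \in \mathcal{N}^{-i}} \big( A_{ij} x_t^j + B_{ij} u_t^j \big), \quad \forall i = 1, \dots, M, \tag{6}
\]

where the index set $\mathcal{N}^{-i} = \mathcal{N}^i \setminus \{i\}$, i.e. it contains all the indices of the
subsystems which interact with the $i$th subsystem. If we introduce an auxiliary
variable $w_t^i \in \mathbb{R}^{p_i}$ to represent the influence of the neighboring subsystems on
the $i$th subsystem (in applications we usually have $p_i \ll n_i$), we can rewrite
the dynamics (6) as:

\[
x_{t+1}^i = A_i x_t^i + B_i u_t^i + E_i w_t^i,
\]

where the matrices $E_i$ are of appropriate dimensions and

\[
w_t^i = \sum_{j \in \mathcal{N}^{-i}} \tilde{A}_{ij} x_t^j + \tilde{B}_{ij} u_t^j,
\]

with the matrices $\tilde{A}_{ij}, \tilde{B}_{ij}$ being obtained from the matrices $A_{ij}, B_{ij}$ by removing
the rows with all entries equal to zero. We consider a quadratic performance
index for each subsystem $i$ of the form:

\[
\sum_{t=0}^{N-1} \|x_t^i\|^2_{Q_i} + \|u_t^i\|^2_{R_i} + \|x_N^i\|^2_{P_i},
\]

where the matrices $Q_i$, $R_i$ and $P_i$ are positive semidefinite. We also assume
that the sets $X^i$ and $U^i$ that define the state and input constraints (4) are
polyhedral. The centralized control problem over the prediction horizon $N$ for
this application can be formulated as follows:

\[
\begin{aligned}
\min_{x_t^i,\, u_t^i,\, w_t^i} \ & \sum_{i=1}^{M} \sum_{t=0}^{N-1} \|x_t^i\|^2_{Q_i} + \|u_t^i\|^2_{R_i} + \|x_N^i\|^2_{P_i} && \text{(7)} \\
\text{s.t.:} \ & x_0^i = x^i, \quad x_{t+1}^i = A_i x_t^i + B_i u_t^i + E_i w_t^i, && \text{(7.1)} \\
& w_t^i = \sum_{j \in \mathcal{N}^{-i}} \tilde{A}_{ij} x_t^j + \tilde{B}_{ij} u_t^j, && \text{(7.2)} \\
& x_t^i \in X^i, \ u_t^i \in U^i \quad \forall t, i. && \text{(7.3)}
\end{aligned}
\]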
We can eliminate the state variables in the optimization problem (7) using the
dynamics (7.1). In this case we can define $\mathbf{x}^i = [(w_0^i)^T \cdots (w_{N-1}^i)^T \ (u_0^i)^T \cdots (u_{N-1}^i)^T]^T$.
Then, the control problem (7) can be recast as a separable convex quadratic
program with decoupled cost and coupled constraints in the form (DCCC):

\[
\begin{aligned}
\min_{\mathbf{x}^1, \dots, \mathbf{x}^M} \ & \sum_{i=1}^{M} (\mathbf{x}^i)^T H_i \mathbf{x}^i + q_i^T \mathbf{x}^i && \text{(8)} \\
\text{s.t.:} \ & \mathbf{x}^i \in \mathbf{X}^i, \quad \sum_{i=1}^{M} G_i \mathbf{x}^i = g,
\end{aligned}
\]

where the matrices $H_i$ are positive semidefinite, the local constraint sets $\mathbf{X}^i$ are
polyhedral and the coupled constraints $\sum_{i=1}^{M} G_i \mathbf{x}^i = g$ are obtained from the
coupling between the subsystems, i.e. by stacking the constraints (7.2) for all
$i, t$. Note that the number of rows of the matrices $G_i$ is equal to $N \sum_{i=1}^{M} p_i$.

2.3. Cooperative control problem of dynamically uncoupled systems

Cooperative control for dynamically uncoupled systems arises in a wide va- 
riety of applications like formation flying, mobile sensor networks, rendezvous 
problems or decentralized coordination. The cooperative control problem for 
dynamically uncoupled agents consists in controlling a group of independent 
subsystems (i.e. with decoupled dynamics), but sharing a common goal (see 
e.g. QQQ). 

We consider a set of $M$ identical subsystems, having the following state-space
description:

\[
x_{t+1}^i = \phi(x_t^i, u_t^i), \qquad y_t^i = \theta(x_t^i), \quad \forall i = 1, \dots, M,
\]

where $x_t^i \in \mathbb{R}^n$ is the state vector, $u_t^i \in \mathbb{R}^m$ is the input vector and $y_t^i \in \mathbb{R}^p$ is
the output vector of subsystem $i$. As in the previous section we assume state
and input constraints of the form (4). In the formulation of cooperative control
for uncoupled systems the dynamics of subsystems are independent from each
other, but they share a common goal. This calls for the minimization of a
cost function which involves the states and inputs of each subsystem and their
neighbors as well. In this case we introduce a stage cost at time $t$ of the form
$\ell(x_t^1, \dots, x_t^M, u_t^1, \dots, u_t^M)$ and a final cost $\ell_f(x_N^1, \dots, x_N^M)$.

The cooperative control problem over a finite horizon of length $N$, given the
initial condition $x^i$ for each subsystem $i$, is formulated as follows:

\[
\begin{aligned}
\min_{x_t^i,\, u_t^i} \ & \sum_{t=0}^{N-1} \ell(x_t^1, \dots, x_t^M, u_t^1, \dots, u_t^M) + \ell_f(x_N^1, \dots, x_N^M) && \text{(9)} \\
\text{s.t.:} \ & x_0^i = x^i, \quad x_{t+1}^i = \phi(x_t^i, u_t^i), \\
& x_t^i \in X^i, \ u_t^i \in U^i \quad \forall i, t.
\end{aligned}
\]

Now, let us denote:

\[
\begin{aligned}
\mathbf{x}^i &= [(x_1^i)^T \cdots (x_N^i)^T \ (u_0^i)^T \cdots (u_{N-1}^i)^T]^T, \\
f(\mathbf{x}^1, \dots, \mathbf{x}^M) &= \sum_{t=0}^{N-1} \ell(x_t^1, \dots, x_t^M, u_t^1, \dots, u_t^M) + \ell_f(x_N^1, \dots, x_N^M),
\end{aligned}
\]

and $\mathbf{X}^i$ the constraint set defined by the state and input constraints (4) and by
the $i$th subsystem dynamics $x_{t+1}^i = \phi(x_t^i, u_t^i)$ over the prediction horizon. Using
these notations, the previous cooperative control problem can be recast as an
optimization problem with coupled cost and decoupled constraints (CCDC):

\[
\text{(CCDC)} \qquad \min_{\mathbf{x}^1, \dots, \mathbf{x}^M} \ f(\mathbf{x}^1, \dots, \mathbf{x}^M) \quad
\text{s.t.:} \ \mathbf{x}^i \in \mathbf{X}^i \ \ \forall i.
\]



We are interested in finding efficient parallel algorithms for solving problem 
(CCDC). 

Example 2.3 We consider the formation flying for a group of satellites that
are distributed along a circular orbit with independent dynamics but have
to maintain a constant distance with respect to the two nearest neighbors.
Using a discretized version of the linear Clohessy-Wiltshire equations
of the $i$th satellite for a nominal circular trajectory:

\[
\begin{aligned}
\ddot{x}^{1,i} &= 3\omega_n^2 x^{1,i} + 2\omega_n \dot{x}^{2,i} + a^{1,i}, \\
\ddot{x}^{2,i} &= -2\omega_n \dot{x}^{1,i} + a^{2,i}, \\
\ddot{x}^{3,i} &= -\omega_n^2 x^{3,i} + a^{3,i},
\end{aligned}
\]

where $x^{1,i}, x^{2,i}, x^{3,i}$ are the displacements in the radial, tangential and
out-of-plane direction, $a^{1,i}, a^{2,i}, a^{3,i}$ represent the accelerations of the satellite $i$ due
to propulsion or external disturbances and $\omega_n$ is the angular velocity at which
the orbit is covered, we obtain a discrete-time linear system for the $i$th satellite
of the form

\[
x_{t+1}^i = A x_t^i + B u_t^i, \qquad y_t^i = C x_t^i,
\]

with $x_t^i \in \mathbb{R}^6$ and $u_t^i = [a_t^{1,i} \ a_t^{2,i} \ a_t^{3,i}]^T$ being the state, respectively the
input vectors of satellite $i$, and we consider as output $y_t^i = [x_t^{1,i} \ x_t^{2,i} \ x_t^{3,i}]^T$, the
vector of absolute positions of the satellite. We also assume input constraints
of the form:

\[
u_{\min} \leq u_t^i \leq u_{\max} \quad \forall i, t.
\]

Since the goal is to maintain a constant distance with respect to the two nearest
neighbors, we choose the following stage cost at time $t$:

\[
\ell(x_t^1, \dots, x_t^M, u_t^1, \dots, u_t^M) = \sum_{i=1}^{M} \|y_t^{i-1} - 2 y_t^i + y_t^{i+1}\|^2_{Q_i} + \|u_t^i\|^2_{R_i} \quad \forall t,
\]

where $Q_i, R_i$ are positive definite matrices. We assume the final cost $\ell_f = 0$.
Despite the fact that the output represents the absolute positions of the
$i$th satellite, using the stage cost from above, the formation flying becomes a
problem based on relative positions between the satellites instead of the absolute
ones. In this case the cooperative control problem (9) over a finite horizon $N$
can be recast as a convex quadratic problem with coupled cost and decoupled
constraints in the form (CCDC):

\[
\begin{aligned}
\min_{\mathbf{x}^1, \dots, \mathbf{x}^M} \ & [(\mathbf{x}^1)^T \cdots (\mathbf{x}^M)^T] \, H \, [(\mathbf{x}^1)^T \cdots (\mathbf{x}^M)^T]^T && \text{(10)} \\
\text{s.t.:} \ & \mathbf{x}^i \in \mathbf{X}^i \ \ \forall i,
\end{aligned}
\]

where the blocks of the positive semidefinite Hessian matrix $H = [H_{ij}]_{ij}$ satisfy
$H_{ij} = 0$ if $|i - j| \geq 3$ for all $i, j$ and the sets $\mathbf{X}^i$ are polyhedral.
Remark 2.4 (i) Note that we can eliminate the states $x_1^i, \dots, x_N^i$ using the
dynamics of the $i$th satellite and keep only the inputs over the prediction
horizon as decision variables, i.e. we may redefine $\mathbf{x}^i = [(u_0^i)^T \cdots (u_{N-1}^i)^T]^T$. In this
case $H$ becomes positive definite and the sets $\mathbf{X}^i$ are described only by linear
inequalities.

(ii) In many applications we can move the coupling terms from the cost to the
constraints by introducing auxiliary variables, i.e. we can recast an optimization
problem with coupled cost but decoupled constraints (CCDC) to one with
decoupled cost but coupled constraints (DCCC). E.g., in our satellite formation
application we can define the coupling constraints $w_t^i = y_t^{i-1} + y_t^{i+1}$ and then
we can associate a local stage cost for each satellite $i$ as $\ell^i(x_t^i, w_t^i, u_t^i) = \|2 C x_t^i - w_t^i\|^2_{Q_i} + \|u_t^i\|^2_{R_i}$,
but with coupled dynamics $w_t^i = C(x_t^{i-1} + x_t^{i+1})$. We can also go
the other way around: we can reformulate a (DCCC) into a (CCDC) problem
(e.g. by moving the coupling constraints (5.1) into the cost, see Section 3).
Depending on the application, one formulation might be preferred over the other
(see also Section 3).

2.4. Cooperative control problem of dynamically coupled systems 

In this section we discuss the cooperation-based optimal control problem for
a group of dynamically coupled subsystems [21, 39, 42]. For the $i$th
subsystem we consider the following linear dynamics:

\[
x_{t+1}^i = A_i x_t^i + B_i u_t^i + \sum_{j \in \mathcal{N}^{-i}} B_{ij} u_t^j, \quad \forall i = 1, \dots, M. \tag{11}
\]

Note that the dynamics described in (11) are a particular case of (6). We also
assume local input constraints $u_t^i \in U^i$, where $U^i$ are convex sets.

For each subsystem we define a local stage cost $\ell^i(x^i, u^i)$ and a terminal cost
$\ell_f^i(x^i)$. The local cost for each subsystem on a finite horizon of length $N$ will
be of the following form:

\[
f^i(\mathbf{x}^i, \mathbf{u}^i) = \sum_{t=0}^{N-1} \ell^i(x_t^i, u_t^i) + \ell_f^i(x_N^i), \tag{12}
\]

where we denote with

\[
\mathbf{x}^i = [(x_1^i)^T \cdots (x_N^i)^T]^T, \qquad \mathbf{u}^i = [(u_0^i)^T \cdots (u_{N-1}^i)^T]^T. \tag{13}
\]

In order to provide a cooperative behavior between subsystems we replace
each local cost with one that represents the systemwide impact of local
control actions. One choice is to employ a strong convex combination of local
subsystems' costs as the global objective function for the entire system. In
these conditions, the cooperative control problem for coupled systems on a
finite horizon $N$ will have the form:

\[
\begin{aligned}
\min_{\mathbf{x}^i,\, \mathbf{u}^i} \ & \sum_{i=1}^{M} \alpha_i f^i(\mathbf{x}^i, \mathbf{u}^i) && \text{(14)} \\
\text{s.t.:} \ & x_{t+1}^i = A_i x_t^i + B_i u_t^i + \sum_{j \in \mathcal{N}^{-i}} B_{ij} u_t^j, \quad x_0^i = x^i, && \text{(14.1)} \\
& u_t^i \in U^i \quad \forall t, i, && \text{(14.2)}
\end{aligned}
\]

where $\alpha_i > 0$ and $\sum_{i=1}^{M} \alpha_i = 1$. Note that in this form problem (14) is a particular
case of problem (DCCC), where the variables associated to the $i$th subsystem
are given by $(\mathbf{x}^i, \mathbf{u}^i)$. However, by eliminating the states in (14) using the
global dynamic model (14.1) we obtain a coupled objective function in the local
variables $\mathbf{x}^i = \mathbf{u}^i$ (i.e. in the local control actions) and decoupled constraints,
which is a particular case of (CCDC) problem (see also Remark 2.4(ii)).

3. Parallel and distributed optimization algorithms for solving coupled optimization problems

In this section we present several parallel and distributed algorithms for
solving the optimization problems arising in applications from estimation and
control discussed in Section 2 and analyze their properties and performance;
in particular, we define conditions under which these algorithms converge$^2$. The
presented algorithms can be classified, on the one hand, into "centralized"
algorithms (that in general take advantage of the sparsity of the problem and
solve in parallel low dimension subproblems) and distributed algorithms (that
take into account explicitly information restrictions in the network and combine
consensus negotiations with optimization methods to solve the problem
distributively), and on the other hand into primal and dual decomposition algorithms.
Primal decomposition is based on decomposing the original optimization problem, while
dual decomposition consists in decomposing the corresponding dual problem.

$^2$For simplicity of the exposition, in this section we assume that all the functions are
differentiable.

For a given problem representation there are often many choices of dis- 
tributed algorithms, each with possible different characteristics: e.g. rate of 
convergence, tradeoff between local computation and global communication, 
and quantity of message passing. Which alternative is the best depends on the
specifications of the application. For each algorithm we therefore discuss in
detail its main characteristics in terms of performance and properties.



3.1. Distributed gradient algorithms for optimization problems of type (DCx) 

In this section we study several distributed algorithms for solving separable
optimization problems with decoupled cost and common decision variables in
the form (DCx), that e.g. appear in the context of state estimation in sensor
networks (see Section 2.1). We associate to the set of agents (e.g. sensors)
a graph $G = (V, E)$ and then such distributed algorithms must satisfy the
following constraint: the computations will be performed on all nodes in parallel,
and the communication between nodes is restricted to the edges of the graph.
Distributed optimization algorithms are mainly based on combining consensus
negotiations (as an efficient method for information fusion) with optimization
methods [3] to solve distributively problems of type (DCx).

First we introduce the consensus problem for a group of $M$ agents that
considers conditions under which, using a certain message-passing protocol, the
local variables of each agent will converge to the same value [3]. There
exist several results related to the convergence of local variables to a common
value using various information exchange protocols among agents [37].

One of the most used models for consensus is based on the following
discrete-time iteration: to generate an estimate at iteration $k + 1$, agent $i$ forms a convex
combination of its estimate $\mathbf{x}_k^i$ with the estimates received from other agents:

\[
\mathbf{x}_{k+1}^i = \sum_{j=1}^{M} \gamma_k^{ij} \mathbf{x}_k^j,
\]

where $\gamma_k^{ij}$ represent nonnegative weights$^3$ satisfying $\sum_{j=1}^{M} \gamma_k^{ij} = 1$. At each
iteration $k$ the information exchange among agents can be represented by a graph
$(V, E_k)$, where $E_k = \{(i, j) : \gamma_k^{ij} > 0\}$. We can also introduce the graph $(V, E_\infty)$,
where $E_\infty = \{(i, j) : (i, j) \in E_k \text{ for infinitely many } k\}$. The graphs $(V, E_k)$
satisfy the bounded interconnection interval property if there exists an integer $r$
such that for any $(i, j) \in E_\infty$ agent $j$ sends its information to agent $i$ at least
once every $r$ consecutive iterations. It has been proved that under certain
assumptions on the weights $\gamma_k^{ij}$ (e.g. stochasticity of the matrix $\Gamma_k = [\gamma_k^{ij}]_{ij}$,
strong connectivity property of $(V, E_\infty)$ and bounded interconnection interval
property), the states $\mathbf{x}_k^i$ of all agents converge to the same state $\mathbf{x}^*$. Similar
convergence results can be found in [24, 51].

We return now to our optimization problem of type (DCx). A distributed
projected gradient algorithm has been analyzed in the literature, which basically combines the
consensus iteration presented above with a projected gradient update to
generate the next estimate of the optimum. More specifically, an agent $i$ updates its
estimate by combining the estimates received from its neighbors, then taking a
gradient step to minimize its objective function and finally projecting on the
set $\mathbf{X}$:

Algorithm dgp1

\[
\mathbf{x}_{k+1}^i = \Big[ \sum_{j=1}^{M} \gamma_k^{ij} \mathbf{x}_k^j - \alpha_k \nabla f^i \Big( \sum_{j=1}^{M} \gamma_k^{ij} \mathbf{x}_k^j \Big) \Big]_{\mathbf{X}},
\]

where $\alpha_k$ is a common step size, $\nabla f^i$ denotes the gradient of the function
$f^i$ and $[\cdot]_{\mathbf{X}}$ denotes the Euclidean projection on the set $\mathbf{X}$. The following
convergence result holds for Algorithm dgp1:

$^3$Naturally, an agent $i$ assigns zero weight to the estimates for those agents $j$ whose
estimate information is not available at the update time.



Theorem 3.1. For the optimization problem (DCx) we assume that all
the functions $f^i$ are convex and have bounded gradients, the set $\mathbf{X}$ is convex and
the step size satisfies $\sum_k \alpha_k = \infty$ and $\sum_k \alpha_k^2 < \infty$. Moreover, we assume that
the weights $\gamma_k^{ij}$ satisfy the following properties: the matrices $\Gamma_k = [\gamma_k^{ij}]_{ij}$ are
doubly stochastic, the graph $(V, E_\infty)$ is connected and the bounded
interconnection interval property holds. Then, the distributed projected gradient Algorithm
dgp1 converges to an optimum of problem (DCx).
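
A minimal sketch of a distributed projected gradient iteration of this type is given below. It is not the authors' implementation, and it relies on several assumptions chosen only to keep the example small: the local costs are $f^i(\mathbf{x}) = \frac{1}{2}\|\mathbf{x} - c_i\|^2$ with hypothetical centers $c_i$, the set $\mathbf{X}$ is a box, the weights are fixed over the iterations (a ring of four agents), the gradient is evaluated at the consensus average, and the step size is $\alpha_k = 1/(k+1)$, which satisfies the two step-size conditions of Theorem 3.1.

```python
def project_box(x, lo, hi):
    """Euclidean projection of a vector onto the box [lo, hi]^n."""
    return [min(max(xi, lo), hi) for xi in x]


def dgp1(centers, gamma, lo, hi, iters):
    """Distributed projected gradient sketch for f_i(x) = 0.5 * ||x - c_i||^2."""
    M, n = len(centers), len(centers[0])
    x = [project_box([0.0] * n, lo, hi) for _ in range(M)]  # start inside the box
    for k in range(iters):
        alpha = 1.0 / (k + 1)  # diminishing step: sum diverges, squares are summable
        # Consensus step: each agent averages the estimates it receives.
        v = [[sum(gamma[i][j] * x[j][d] for j in range(M)) for d in range(n)]
             for i in range(M)]
        # Local gradient step (gradient of f_i at v_i is v_i - c_i), then projection.
        x = [project_box([v[i][d] - alpha * (v[i][d] - centers[i][d])
                          for d in range(n)], lo, hi)
             for i in range(M)]
    return x


RING = [
    [1 / 3, 1 / 3, 0.0, 1 / 3],
    [1 / 3, 1 / 3, 1 / 3, 0.0],
    [0.0, 1 / 3, 1 / 3, 1 / 3],
    [1 / 3, 0.0, 1 / 3, 1 / 3],
]
CENTERS = [[0.0, 0.0], [3.0, 1.0], [2.0, 0.5], [2.0, 0.5]]

if __name__ == "__main__":
    for estimate in dgp1(CENTERS, RING, 0.0, 1.0, 2000):
        print(estimate)
```

For these costs the sum of the $f^i$ equals, up to a constant, $\frac{M}{2}\|\mathbf{x} - \bar{c}\|^2$ with $\bar{c}$ the mean of the centers, so the constrained optimum is the projection of $\bar{c} = (1.75, 0.5)$ onto the box, i.e. $(1, 0.5)$. All agents should approach this point, with a residual disagreement that shrinks with the step size.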

An interesting variant of the distributed projected gradient algorithm has been
provided in [l^ . Compared to the previous distributed gradient Algorithm
dgp1, in fl^ a fixed connected graph $(V, E)$ is used over all iterations and
the information exchange among the agents is represented by a doubly stochastic
matrix $\Gamma = [\gamma^{ij}]_{ij}$ such that $\gamma^{ij} > 0$ if
$(i, j) \in E$. In this algorithm, each agent first implements the gradient
update locally and then runs a number $\mu$ of consensus iterations with its
neighbors:

Algorithm dgp2

$$x_{k+1}^i = \Big[ \sum_{j=1}^{M} [\Gamma^{\mu}]_{ij} \big( x_k^j - \alpha_k \nabla f^j(x_k^j) \big) \Big]_X,$$


where $[\Gamma^{\mu}]_{ij}$ denotes the $(i, j)$ entry of the matrix
$\Gamma^{\mu}$. Under similar assumptions as in Theorem 3.1, the authors in \l\
proved convergence of Algorithm dgp2 for a constant step size and for a
sufficiently large $\mu$.
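
The order of operations in this variant (local gradient step first, then $\mu$ consensus rounds, then projection) can be sketched as follows. The data are assumptions made for illustration only and do not come from the paper: local costs $f^i(x) = \frac{1}{2}(x - c_i)^2$, $X = [0, 2.5]$, a fixed path graph, constant step size $\alpha = 0.5$ and $\mu = 10$.

```python
# Toy instance of a dgp2-style iteration: gradient step, MU consensus rounds,
# projection. Assumed data (not from the paper): f_i(x) = 0.5 * (x - c_i)^2.

C = [1.0, 2.0, 3.0, 6.0]
LO, HI = 0.0, 2.5

# Fixed, symmetric, doubly stochastic weight matrix for the path graph 1-2-3-4.
W = [
    [2 / 3, 1 / 3, 0.0, 0.0],
    [1 / 3, 1 / 3, 1 / 3, 0.0],
    [0.0, 1 / 3, 1 / 3, 1 / 3],
    [0.0, 0.0, 1 / 3, 2 / 3],
]


def consensus(weights, z, rounds):
    """Run `rounds` neighbor-averaging steps, i.e. compute weights^rounds * z."""
    m = len(z)
    for _ in range(rounds):
        z = [sum(weights[i][j] * z[j] for j in range(m)) for i in range(m)]
    return z


def dgp2(c, weights, alpha=0.5, mu=10, iters=100):
    """Constant-step variant: local gradient step, then mu consensus rounds."""
    m = len(c)
    x = [0.0] * m
    for _ in range(iters):
        # Each agent moves along its own negative gradient (x_i - c_i).
        z = [x[i] - alpha * (x[i] - c[i]) for i in range(m)]
        # The agents then average the updated points mu times.
        z = consensus(weights, z, mu)
        # Finally every agent projects onto the common set X.
        x = [min(max(zi, LO), HI) for zi in z]
    return x
```

In this particular instance the constraint is active at the optimum $x^* = 2.5$, so `dgp2(C, W)` settles at it. More generally, a constant step size combined with a finite number of consensus rounds should be expected to leave some residual disagreement between the agents, which shrinks as $\mu$ grows.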

In the case when the set $X$ is explicitly defined through a finite set of
equalities and inequalities, an algorithm based on a penalty primal-dual
approach has been recently proposed in [52]. This algorithm allows the agents
to exchange information over networks with time-varying topologies and to
asymptotically agree on an optimal solution and the optimal value.



18 



Another interesting approach for solving the optimization problem (DCx), but in
a serial fashion, can be found in [30], where an incremental gradient method is
presented. Each step of the algorithm is a gradient iteration for a single
component function $f^i$ and there is one step per component function. Thus, an
iteration can be viewed as a cycle of $M$ subiterations, so that at iteration
$k + 1$:

$$x_{k+1} = z_{M,k}, \qquad z_{0,k} = x_k,$$
$$z_{i,k} = \big[ z_{i-1,k} - \alpha_k \nabla f^i(z_{i-1,k}) \big]_X \quad \forall i = 1, \dots, M.$$

For convex problems, using an appropriate step size $\alpha_k$, the authors in
[30] show that this algorithm has a much better practical rate of convergence
than the classical gradient method.
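
A minimal Python sketch of one such cycle of subiterations is given below. The problem data are assumptions for illustration and not part of the paper: component functions $f^i(x) = \frac{1}{2}(x - c_i)^2$, $X = [0, 2.5]$ and $\alpha_k = 1/(k+1)$.

```python
# Toy instance of a cyclic incremental projected gradient method.
# Assumed data (not from the paper): f_i(x) = 0.5 * (x - c_i)^2, X = [LO, HI].

C = [1.0, 2.0, 3.0, 6.0]
LO, HI = 0.0, 2.5


def incremental_gradient(c, cycles=50):
    """Each outer iteration passes the estimate once around all components."""
    x = 0.0
    for k in range(cycles):
        alpha = 1.0 / (k + 1)
        z = x  # z_{0,k} = x_k
        for ci in c:
            # One projected gradient step on a single component function.
            z = min(max(z - alpha * (z - ci), LO), HI)
        x = z  # x_{k+1} = z_{M,k}
    return x
```

Note that only a single estimate is passed from agent to agent, which is why a cyclic ordering of the agents is needed. For the data above `incremental_gradient(C)` returns the constrained minimizer $x^* = 2.5$.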

Remark 3.2 (i) The convexity assumptions on the functions $f^i$ and the set
$X$ required for convergence of Algorithms dgp1 and dgp2 are satisfied in many
applications: see e.g. the state estimation problem for linear systems
discussed in Example 2.1, which leads to the convex quadratic program (2).

(ii) One of the main challenges when solving problems of type (DCx) is the
time-dependent communication topology, as communication links can change
due to changing distances, obstacles, or disturbances. While in [l^ a constant
topology is assumed for Algorithm dgp2, Algorithm dgp1 and the algorithm from
[52] are based on a changing topology, which makes them more suitable for
practical applications. Moreover, the cyclic incremental algorithm [30] can
be implemented only when each agent identifies a suitable downstream and
upstream neighbor. Note that the existence of a cycle is a stronger assumption
than connectivity.



(iii) From simulations we have observed that the algorithms from [l^ . 0, [52]
are very sensitive to the choice of the weights, which must be tuned since they
are treated as parameters in these methods. These algorithms do not provide a
systematic way of choosing the weights of the consensus protocol, although this
choice has a very strong influence on the convergence rate. Recently, in [26],
a distributed algorithm has been derived for solving particular cases of
problems of type (DCx), where the nonnegative weights corresponding to the
consensus process are interpreted as dual variables and thus are updated
using arguments from duality theory. Moreover, if the network is not densely
connected (dense meaning that each sensor has a large number of neighbors), one
can expect the performance of the algorithms from [l^, [52] to be worse than
that of the cyclic incremental gradient algorithm [30].



M     N     nr. it. dgp1    nr. it. dgp2
10    10    5,627           586
10    20    8,447           746
20    10    10,651          1,854
20    20    14,758          2,571

Table 1: State estimation problem of Example 2.1: we consider $M = 10, 20$
sensors, a linear system with 5 states and a prediction horizon $N = 10, 20$.
We solve the convex quadratic program (2) with the accuracy of the solution
ε = 10~^. We assume fixed weights in both algorithms such that
$\gamma^{ij} = 0$ for $|i - j| > 1$ and $\mu = 10$. From simulations we observe
that Algorithm dgp2 works better than Algorithm dgp1 in terms of the number of
gradient iterations. However, Algorithm dgp2 also needs to perform $\mu = 10$
consensus steps for each gradient iteration.



3.2. Decomposition algorithms for solving optimization problems (DCCC) 

In this section we present several decomposition algorithms for solving
separable optimization problems with decoupled cost but coupled constraints in
the form (DCCC). Distributed control for complex processes with interacting
subsystem dynamics usually leads to such optimization problems (see e.g.
Section 2.2). We discuss two classes of decomposition principles: primal and
dual. We use the terms primal and dual in their mathematical programming
meaning: primal indicates that the optimization problems are solved using the
original formulation and variables, and dual indicates that the original
problem has been rewritten using Lagrangian relaxation.

Compared to the general formulation of problem (DCCC), we focus in this
section on decomposition methods for the particular case of separable convex
problems with decoupled cost and coupled constraints*:



(conv-DCCC):
$$\min_{x^1, \dots, x^M} \sum_{i=1}^{M} f^i(x^i) \quad \text{s.t.:} \;\; x^i \in X^i, \;\; \sum_{i=1}^{M} G_i x^i = g,$$

where we consider that for all $i$ the coupled constraints
$h^i(x^j; j \in N^i) = 0$ in problem (DCCC) become linear and can be written
compactly as $\sum_{i=1}^{M} G_i x^i = g$, with matrices $G_i$ of appropriate
dimensions. For simplicity of the exposition the following assumptions hold for
problem (conv-DCCC) (for the general case of convex problems see e.g. [28]):

Assumption 3.3. Each function $f^i$ is convex quadratic and the sets $X^i$ are
compact convex sets. Moreover, Slater's condition holds, i.e. there exist
$x^i \in \mathrm{int}(X^i)$ such that $\sum_{i=1}^{M} G_i x^i = g$.



From Example 2.2 we have seen that centralized optimal control for
interconnected linear systems leads to such a separable convex quadratic
formulation.

We begin with primal decomposition (see e.g. [S, [Zl, [38], [43] and the
references therein). We can decompose the original problem (conv-DCCC) as
follows: we introduce auxiliary variables in order to separate the coupled
linear equality constraints, i.e. we introduce the new variables
$t^1, \dots, t^{M-1}$, and obtain $M$ subproblems:

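$$(P^i): \quad \psi^i(t^i) = \min \{ f^i(x^i) : \; x^i \in X^i, \; G_i x^i = t^i \}$$

for $i = 1, \dots, M - 1$ and the $M$th subproblem

$$(P^M): \quad \psi^M(t^1, \dots, t^{M-1}) = \min \Big\{ f^M(x^M) : \; x^M \in X^M, \; \sum_{i=1}^{M-1} t^i + G_M x^M = g \Big\}.$$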



*For the nonconvex case of problem (DCCC) we can still obtain decomposition
algorithms by combining sequential quadratic programming or sequential convex
programming, in order to linearize the nonlinear coupled constraints, with
decomposition methods that address the decomposable convex problems (see e.g.
[2^).

The separable convex problem (conv-DCCC) reduces to solving the unconstrained
convex primal problem (PP) [43]:

$$(PP): \quad \min_{t^1, \dots, t^{M-1}} \psi(t^1, \dots, t^{M-1}),$$

where $\psi(t^1, \dots, t^{M-1}) = \psi^1(t^1) + \dots + \psi^{M-1}(t^{M-1}) + \psi^M(t^1, \dots, t^{M-1})$.
Conditions for well-posedness of the primal problem (PP) can be found in [43].
Let $x^i(t^i)$ and $\lambda^i(t^i)$ be the optimal solution and the
corresponding optimal Lagrange multiplier for the equality constraints
$G_i x^i = t^i$, respectively, for subproblem $(P^i)$ given $t^i$, with
$i = 1, \dots, M - 1$. Similarly, we define $x^M(t^1, \dots, t^{M-1})$ and
$\lambda^M(t^1, \dots, t^{M-1})$ for subproblem $(P^M)$. Although the function
$\psi$ is potentially nonsmooth, assuming that Slater's condition for the
convex problem (conv-DCCC) holds (according to Assumption 3.3), the following
vector is a subgradient of $\psi$ at $(t^1, \dots, t^{M-1})$ [1], [4^:

Algorithm primal subgradient (PS) 

$$x_k^i = x^i(t_k^i), \quad \lambda_k^i = \lambda^i(t_k^i) \quad \text{for } i = 1, \dots, M - 1$$

— ^ l^fcj J^k ^k ^ \^k^ ^^k 



where $\alpha_k$ is a step size.

Remark 3.4 The step size $\alpha_k$ can be chosen in two ways: (i) it can vary
while satisfying $\sum_k \alpha_k = \infty$ and $\sum_k \alpha_k^2 < \infty$;
(ii) $\alpha_k$ is constant for all $k$.

A vector $s \in \mathbb{R}^n$ is a subgradient of
$f : \mathbb{R}^n \to \mathbb{R}$ at a point $x \in \mathrm{dom} f$ if for all
$y \in \mathrm{dom} f$ we have $f(y) \ge f(x) + s^T (y - x)$.

Under Assumption 3.3 the convergence of this primal subgradient algorithm
follows from the equivalence between the (conv-DCCC) problem and the convex
primal problem (PP). When the primal problem (PP) (also called the master
problem) is solved using this scheme, the method has an interesting economic
interpretation: at each iteration the master program allocates the resources
(by choosing $t_k^i$) and the nodes return the prices $\lambda_k^i$ associated
with this choice. The iteration continues until the prices have reached
equilibrium.
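
The resource-allocation reading of primal decomposition can be made concrete on a two-agent toy problem. The sketch below is illustrative only and rests on assumptions that are not in the paper: scalar quadratic costs $f^i(x) = \frac{1}{2}(x - a_i)^2$, $G_1 = G_2 = 1$, sets $X^i$ large enough to remain inactive, a constant step size, and multipliers defined through the Lagrangians $f^1(x^1) + \lambda^1 (x^1 - t)$ and $f^2(x^2) + \lambda^2 (t + x^2 - g)$, which fixes the sign of the update.

```python
# Toy primal decomposition: a master variable t splits the resource G between
# two agents. Assumed problem (not from the paper):
#   min 0.5*(x1 - A1)^2 + 0.5*(x2 - A2)^2   s.t.   x1 + x2 = G

A1, A2, G = 1.0, 3.0, 2.0


def solve_p1(t):
    """Subproblem P^1: min f1(x1) s.t. x1 = t. Returns (x1, lambda1)."""
    x1 = t
    lam1 = -(x1 - A1)  # stationarity of f1(x1) + lam1 * (x1 - t)
    return x1, lam1


def solve_p2(t):
    """Subproblem P^2: min f2(x2) s.t. t + x2 = G. Returns (x2, lambda2)."""
    x2 = G - t
    lam2 = -(x2 - A2)  # stationarity of f2(x2) + lam2 * (t + x2 - G)
    return x2, lam2


def primal_subgradient(t0=1.0, alpha=0.25, iters=60):
    """Master iteration: adjust the allocation t using the returned prices."""
    t = t0
    for _ in range(iters):
        _, lam1 = solve_p1(t)
        _, lam2 = solve_p2(t)
        # With the sign convention above, d(psi)/dt = lam2 - lam1.
        t = t - alpha * (lam2 - lam1)
    x1, _ = solve_p1(t)
    x2, _ = solve_p2(t)
    return t, x1, x2
```

Here `primal_subgradient()` drives the allocation to $t = 0$, giving $x^1 = 0$ and $x^2 = 2$, and the coupling constraint $x^1 + x^2 = g$ holds at every iteration because each subproblem enforces its own share.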



We now discuss dual decomposition [48]. In dual decomposition methods we have
the following economic interpretation: the master problem sets the prices for
the resources to each subproblem, which has to decide the amount of resources
to be used depending on the price. The iteration continues until the best
pricing strategy is obtained. Clearly, if the coupled constraints
$\sum_i G_i x^i = g$ are absent, then the problem (conv-DCCC) can be
decoupled. Therefore it makes sense to relax these coupled constraints using
duality theory. We construct the partial augmented Lagrangian:

$$L_\mu(x, \lambda) = \sum_{i=1}^{M} \big[ f^i(x^i) + \mu P_{X^i}(x^i) \big] + \lambda^T \Big( \sum_{i=1}^{M} G_i x^i - g \Big), \qquad (15)$$

where $\mu > 0$ and the functions $P_{X^i}$ associated to the sets $X^i$
(usually called prox functions) must have certain properties explained below.
We also define the corresponding augmented dual function:

$$d_\mu(\lambda) = \min_{x^i \in X^i} L_\mu(x, \lambda), \qquad (16)$$

and from the structure of $L_\mu$ we obtain that the minimization decouples
into $M$ subproblems

$$x^i(\mu, \lambda) = \arg\min_{x^i \in X^i} f^i(x^i) + \mu P_{X^i}(x^i) + \lambda^T G_i x^i.$$

We are interested in the properties of the family of augmented dual functions
$\{d_\mu\}_{\mu > 0}$. Note that $\lim_{\mu \to 0} d_\mu(\lambda) = d_0(\lambda)$,
where $d_0(\lambda) = \min_{x^i \in X^i} L_0(x, \lambda)$ is the standard dual
function, whenever the prox functions $P_{X^i}$ are chosen to be continuous on
the compact sets $X^i$ or are barrier functions associated to these sets (see
[33]). The goal is to maximize the augmented dual function for $\mu$
sufficiently small:

$$\max_\lambda d_\mu(\lambda),$$

in order to find an approximation of the optimal Lagrange multiplier
$\lambda^* = \arg\max_\lambda d_0(\lambda)$ and then to recover an
approximation of the corresponding optimal primal variables $x^*$. We
distinguish three algorithms, depending on the choice of the constant $\mu$ and
of the prox functions $P_{X^i}$:

(I) dual subgradient algorithm: $\mu = 0$ and $P_{X^i} = 0$

(II) dual fast gradient algorithm: $\mu > 0$ and $P_{X^i}$ are strongly convex
functions

(III) dual interior-point algorithm: $\mu > 0$ and $P_{X^i}$ are barrier
functions for the sets $X^i$.

The next theorem provides the main properties of the augmented dual function:

Theorem 3.5. [28], [29] Under Assumption 3.3 the augmented dual function
$d_\mu$ is characterized as follows:

(I) For any $\mu > 0$ and convex functions $P_{X^i}$, a subgradient of $d_\mu$
at $\lambda$ is given by $\sum_{i=1}^{M} G_i x^i(\mu, \lambda) - g$. (II) For
$\mu > 0$ and strongly convex functions $P_{X^i}$, the function $d_\mu$ has a
Lipschitz continuous gradient. (III) For $\mu > 0$ and barrier functions
$P_{X^i}$, the function $d_\mu$ is self-concordant.

We denote $x_k^i = x^i(\mu_k, \lambda_k)$. The iterations of the three
algorithms are:
Algorithm dual subgradient (DS)

$$\lambda_{k+1} = \lambda_k + \alpha_k \Big( \sum_{i=1}^{M} G_i x_k^i - g \Big)$$

Algorithm dual fast gradient (DFG)

$$\hat{\lambda}_{k+1} = \lambda_k + \frac{1}{L_\mu} \Big( \sum_{i=1}^{M} G_i x_k^i - g \Big), \qquad \lambda_{k+1} = \hat{\lambda}_{k+1} + \beta_k (\hat{\lambda}_{k+1} - \hat{\lambda}_k)$$

Algorithm dual interior-point (DIP)

$$\lambda_{k+1} = \lambda_k + \alpha_k \big( \nabla^2 d_{\mu_p}(\lambda_k) \big)^{-1} \nabla d_{\mu_p}(\lambda_k) \quad \text{as } \mu_p \to 0,$$

where $\alpha_k$ is a step size that can be chosen as in Remark 3.4 for
algorithm (DS) or satisfying the Armijo rule [33] for algorithm (DIP), $L_\mu$
is the Lipschitz constant of the gradient $\nabla d_\mu$ and $\beta_k > 0$ is
defined iteratively as in 'X.\ . Moreover, in the dual interior-point algorithm
(DIP) we have an outer iteration in $p$, where we decrease $\mu_p \to 0$, and
an inner iteration in $k$, where we need to generate vectors close to the
central path using Newton updates, with $\nabla^2 d_\mu(\lambda)$ representing
the Hessian of the augmented dual function at $\lambda$ (see [29] for more
details).

The convergence of these three algorithms (DS), (DFG) and (DIP) can be
established under suitable assumptions on problem (conv-DCCC) and on the
prox functions $P_{X^i}$:

Theorem 3.6. [28], [29] If Assumption 3.3 holds for the separable convex
problem (conv-DCCC), then all three algorithms (DS), (DFG) and (DIP) are
convergent under a suitable choice of the step size. Moreover, the dual fast
gradient algorithm (DFG) has complexity $O(c_1 / \epsilon)$, while the dual
interior-point algorithm (DIP) has complexity $O(c_2 \log(1/\epsilon))$, where
$\epsilon$ is the accuracy of the approximation of the optimum for problem
(conv-DCCC) and $c_i$ are some positive constants.
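
To make the price-update mechanism concrete, the following Python sketch implements a dual subgradient iteration and a dual fast gradient iteration on a small resource-sharing problem. All specifics are assumptions made for this illustration and are not taken from the paper: the scalar costs $f^i(x) = \frac{1}{2} q_i (x - a_i)^2$, $G_i = 1$, the box $X^i = [-5, 5]$, the prox function $P_{X^i}(x) = \frac{1}{2} x^2$, the bound $M / \mu$ used in place of the Lipschitz constant, and the FISTA-style choice of the momentum weights $\beta_k$.

```python
import math

# Assumed toy problem (not from the paper):
#   min sum_i 0.5*Q[i]*(x_i - A[i])^2   s.t.  sum_i x_i = G,  x_i in [LO, HI]
Q = [1.0, 2.0, 4.0]
A = [1.0, 2.0, 3.0]
G = 3.0
LO, HI = -5.0, 5.0


def local_solutions(lam, mu=0.0):
    """x_i(mu, lam): minimize f_i(x) + mu*0.5*x^2 + lam*x over [LO, HI]."""
    return [min(max((q * a - lam) / (q + mu), LO), HI) for q, a in zip(Q, A)]


def residual(x):
    """Coupling-constraint residual sum_i G_i x_i - g (here every G_i = 1)."""
    return sum(x) - G


def dual_subgradient(alpha=0.2, iters=200):
    """Price update with mu = 0 and no prox term, constant step size."""
    lam = 0.0
    for _ in range(iters):
        lam += alpha * residual(local_solutions(lam))
    return lam, local_solutions(lam)


def dual_fast_gradient(mu=0.5, iters=500):
    """Gradient step on the smoothed dual followed by an extrapolation step."""
    lip = len(Q) / mu  # assumed upper bound on the Lipschitz constant
    lam = lam_hat = 0.0
    t = 1.0
    for _ in range(iters):
        lam_hat_next = lam + residual(local_solutions(lam, mu)) / lip
        t_next = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
        beta = (t - 1.0) / t_next  # assumed momentum rule (FISTA-style)
        lam = lam_hat_next + beta * (lam_hat_next - lam_hat)
        lam_hat, t = lam_hat_next, t_next
    return lam_hat, local_solutions(lam_hat, mu)
```

In both functions the primal points returned at the end satisfy the coupling constraint only up to the accuracy reached by the multiplier, and the point returned by `dual_fast_gradient` is optimal for the smoothed problem rather than the original one, which is why $\mu$ has to be taken small in practice.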

We should note that in the primal subgradient algorithm we maintain feasi- 
bility of the coupled constraints in the problem (conv-DCCC) at each iteration 
while for the dual algorithms feasibility holds only at convergence of these al- 
gorithms and not at the intermediate iterations. Since for control problems the 
coupled constraints represent the dynamics of the networked system over the 
prediction horizon, when using a dual algorithm these dynamics will be satisfied 
only at convergence. This is a major issue when we stop at an intermediate step 
of a dual based algorithm. 

There are also other dual decomposition methods based on the concept of
augmented Lagrangians: e.g. the alternating direction method [48], where a
quadratic penalty term $\| \sum_i G_i x^i - g \|^2$ is added to the standard
Lagrangian $L_0$. A computational drawback of this scheme is that the quadratic
penalty term is not separable in $x^i$. However, this is overcome by carrying
out the minimization in a Gauss-Seidel fashion, followed by a steepest ascent
update of the multipliers. In other dual decomposition methods, such as the
partial inverse method [3] or the proximal point method, a quadratic proximal
term scaled by a parameter $\mu$ is added to the Lagrangian $L_0$. These
schemes have been shown to be very sensitive to the value of the parameter
$\mu$, which makes it difficult in practice to obtain the best convergence
rate. Some heuristics for choosing $\mu$ can be found in the literature.
However, these heuristics have not been formally analyzed from the viewpoint of
efficiency estimates for the general case (linear convergence results have been
obtained e.g. only for strongly convex functions).

The new decomposition methods called here "dual fast gradient" (DFG)
and "dual interior-point" (DIP), obtained by smoothing the Lagrangian, are
more efficient in terms of the number of iterations than the classical primal
or dual subgradient algorithms (see also Table 2). We should note, however,
that algorithm (DFG) is more appropriate than algorithm (DIP) when solving
problems where the number of coupling constraints is large, since for (DIP)
we need to invert at each iteration a square matrix of dimension $n_\lambda$,
where $n_\lambda$ denotes the dimension of $\lambda$ (or equivalently the
number of rows in the matrices $G_i$).

It is also clear that the update rules in algorithms (DS) and (DFG) are
completely distributed, according to the communication graph between
subsystems. Indeed, we recall that the coupling constraints
$h^i(x^j; j \in N^i) = 0$ in problem (conv-DCCC) are assumed to be linear, of
type $G^i [x^j]_{j \in N^i} = g_i$, i.e. we have
$[G_1 \cdots G_M] = [(G^1)^T \cdots (G^M)^T]^T$. Let $\lambda^i$ be the
Lagrange multipliers for the constraints $G^i [x^j]_{j \in N^i} = g_i$, and
thus $\lambda = [(\lambda^1)^T \cdots (\lambda^M)^T]^T$. Then, the main update
rules in Algorithms (DS) and (DFG) are distributed, each agent $i$ using
information only from its neighbors, e.g.:

$$\lambda_{k+1}^i = \lambda_k^i + \alpha_k \big( G^i [x_k^j]_{j \in N^i} - g_i \big).$$

However, for the algorithm (DIP), the update of the Lagrange multiplier has
to be done by a central agent, i.e. in this case we have a star-shaped topology
for the communication among subsystems. Note that for this algorithm the
sparsity of the graph will impose sparsity on the matrices $G_i$, which in turn
will have a strong effect on the computation of the Hessian of the
corresponding dual function (see [29] for more details).

M     N     nr. it. (DS)     nr. it. (DFG)     nr. it. (DIP)
10    10    5,000 (0.19)     1,215 (10^-2)     78 (10^-4)
10    20    5,000 (0.47)     1,873 (10^-2)     117 (10^-4)
10    30    5,000 (0.81)     2,721 (10^-2)     165 (10^-4)

Table 2: Distributed control problem for a network of interconnected linear
subsystems, Example 2.2, where $n_i = 5$, $m_i = 3$ and $p_i = 2$ for all $i$:
we consider $M = 10$ subsystems and a prediction horizon $N = 10$, 20 and 30.
The weighting matrices are taken as $Q_i = I_5$ and $R_i = I_2$. By eliminating
the states we obtain a convex quadratic program where each matrix
$H_i \in \mathbb{R}^{N(m_i + p_i) \times N(m_i + p_i)}$ is positive
semidefinite. In the brackets we display the accuracy $\epsilon$. Clearly, the
dual algorithms based on smoothing techniques, (DFG) and (DIP), work much
better than the classical dual subgradient algorithm (DS).

3.3. Parallel algorithms for solving optimization problems of type (CCDC) 

In this section we study parallel algorithms for solving optimization
problems with coupled cost but decoupled constraints in the form (CCDC), which
appear e.g. in the context of cooperative control (see Sections 2.3 and 2.4).
A well known parallel algorithm in linear algebra for solving systems of
linear equations is the Jacobi algorithm, which can also be used in the context
of optimization [2]. Applying the Jacobi algorithm, we decompose our
optimization problem of type (CCDC) into $M$ optimization subproblems of lower
dimension. In this algorithm each agent updates its variable $x^i$ by solving a
low-dimensional optimization problem in which the remaining variables take the
values computed at the previous iteration. An extension of the Jacobi algorithm
is the Gauss-Seidel algorithm, where at each iteration each agent updates its
variable by solving an optimization problem in which the remaining variables
are replaced with the most recently computed values.

Algorithm Jacobi

$$x_{k+1}^i = \arg\min_{x^i \in X^i} f(x_k^1, \dots, x_k^{i-1}, x^i, x_k^{i+1}, \dots, x_k^M)$$

Algorithm Gauss-Seidel

$$x_{k+1}^i = \arg\min_{x^i \in X^i} f(x_{k+1}^1, \dots, x_{k+1}^{i-1}, x^i, x_k^{i+1}, \dots, x_k^M)$$

It is clear that in the Jacobi algorithm the optimization subproblems can be
solved in parallel at each iteration. The Gauss-Seidel algorithm can also be
parallelized, provided that a coloring scheme can be applied (see Q for more
details).

The convergence of these two algorithms can be established under suitable
contraction assumptions on the mapping $x - \beta \nabla f(x)$ with respect to
the block-maximum norm $\|x\| = \max_i \|x^i\| / c_i$, where the $c_i$'s are
positive scalars and $x = [(x^1)^T \cdots (x^M)^T]^T$.

Theorem 3.7. f^J. For the optimization problem (CCDC) we assume that the
objective function $f$ is differentiable and suppose that the mapping
$x - \beta \nabla f(x)$ is a contraction for some positive scalar $\beta$.
Then, the Jacobi and Gauss-Seidel algorithms are well defined and the sequence
$\{x_k\}_k$ converges linearly to the minimum of (CCDC) for both iterations.
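
The difference between the two update orders can be seen in a short Python sketch. The instance below is an assumption chosen for illustration and is not from the paper: a strongly convex quadratic $f(x) = \frac{1}{2} x^T H x - b^T x$ with a diagonally dominant matrix $H$, scalar blocks, and identical box sets $X^i = [0, 0.6]$.

```python
# Assumed toy problem (not from the paper):
#   min 0.5 * x'Hx - b'x   s.t.  LO <= x_i <= HI   (scalar blocks)
H = [
    [4.0, 1.0, 0.0],
    [1.0, 4.0, 1.0],
    [0.0, 1.0, 4.0],
]
B = [1.0, 2.0, 3.0]
LO, HI = 0.0, 0.6


def best_response(i, x):
    """Minimize f over coordinate i alone, the other coordinates fixed at x."""
    off_diag = sum(H[i][j] * x[j] for j in range(len(x)) if j != i)
    return min(max((B[i] - off_diag) / H[i][i], LO), HI)


def jacobi(tol=1e-12, max_iter=10000):
    """All agents update simultaneously from the previous iterate."""
    x = [0.0] * len(B)
    for k in range(1, max_iter + 1):
        new = [best_response(i, x) for i in range(len(x))]
        if max(abs(u - v) for u, v in zip(new, x)) < tol:
            return new, k
        x = new
    return x, max_iter


def gauss_seidel(tol=1e-12, max_iter=10000):
    """Agents update one after another, reusing the freshest values."""
    x = [0.0] * len(B)
    for k in range(1, max_iter + 1):
        old = list(x)
        for i in range(len(x)):
            x[i] = best_response(i, x)
        if max(abs(u - v) for u, v in zip(x, old)) < tol:
            return x, k
    return x, max_iter
```

Both `jacobi()` and `gauss_seidel()` return the same constrained minimizer together with the number of sweeps used; in the Jacobi version the list comprehension could be evaluated by all agents in parallel, whereas the inner loop of the Gauss-Seidel version is inherently sequential unless a coloring of the coupling graph is exploited.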

For the Gauss-Seidel algorithm, the assumptions for convergence given in
Theorem 3.7 can be relaxed; in particular, the contraction assumption can be
replaced with a convexity assumption on the objective function ($f$ needs to be
differentiable and convex and, furthermore, $f$ needs to be a strictly convex
function of $x^i$ when the values of all the other components of $x$ are held
constant, for each $i$); see Q for more details. If $f$ is not differentiable,
the Jacobi or Gauss-Seidel algorithm can fail to converge to the minimum of
(CCDC) because it can stop at a non-optimal "corner" point at which $f$ is
non-differentiable and from which $f$ cannot be reduced along any coordinate.
The contraction assumption on the function $f$ required for convergence of
these two algorithms is satisfied in many applications: see e.g. the
cooperative control problem for satellite formation discussed in Example 2.3,
which leads to the convex quadratic program (10) for which the Hessian
satisfies the contraction assumption, or the application from Section 2.4.

In 3JI the optimization problem (CCDC) has been solved using a coordinate
descent method. The iteration $k + 1$ of the algorithm has the following form:

$$x_{k+1}^{i_k} = \arg\min_{x^{i_k} \in X^{i_k}} \nabla_{i_k} f(x_k)^T (x^{i_k} - x_k^{i_k}) + \frac{L_{i_k}}{2} \| x^{i_k} - x_k^{i_k} \|^2,$$

where the index $i_k$ is chosen randomly based on a uniform distribution.
Moreover, we assume componentwise Lipschitz continuity of the gradient of $f$
with Lipschitz constant $L_i$, for all $i = 1, \dots, M$. In [s^l Nesterov
proves an $O(1/k)$ rate of convergence in probability for the coordinate
descent algorithm.
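
For scalar blocks and interval constraints, the minimization in this update reduces to a projected step of length $1 / L_i$ along the partial gradient. The sketch below illustrates this on an assumed toy problem that is not from the paper: a quadratic $f(x) = \frac{1}{2} x^T H x - b^T x$ with $X^i = [0, 0.6]$, for which $L_i = H_{ii}$; the seed and iteration count are arbitrary choices.

```python
import random

# Assumed toy problem (not from the paper):
#   min 0.5 * x'Hx - b'x   s.t.  LO <= x_i <= HI   (scalar blocks)
H = [
    [4.0, 1.0, 0.0],
    [1.0, 4.0, 1.0],
    [0.0, 1.0, 4.0],
]
B = [1.0, 2.0, 3.0]
LO, HI = 0.0, 0.6


def random_coordinate_descent(iters=2000, seed=0):
    """Update one uniformly chosen coordinate per iteration."""
    rng = random.Random(seed)
    n = len(B)
    x = [0.0] * n
    for _ in range(iters):
        i = rng.randrange(n)  # uniform choice of the block index i_k
        grad_i = sum(H[i][j] * x[j] for j in range(n)) - B[i]
        lip_i = H[i][i]  # coordinate-wise Lipschitz constant L_i
        # Minimizer of the linear-plus-quadratic model over [LO, HI].
        x[i] = min(max(x[i] - grad_i / lip_i, LO), HI)
    return x
```

Only one block is touched per iteration, so each step needs the partial gradient of a single agent; for a quadratic objective with $L_i = H_{ii}$ the step coincides with an exact minimization along the chosen coordinate.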

For cooperative control problems of dynamically coupled systems (see
Section 2.4), which also lead to optimization problems of the form (CCDC),
various versions of Jacobi-based algorithms have been proposed in the
literature. For example, in [42], Q, the authors have proposed an algorithm of
the following form:

$$\bar{x}_k^i = \arg\min_{x^i \in X^i} f(x_k^1, \dots, x_k^{i-1}, x^i, x_k^{i+1}, \dots, x_k^M),$$
$$x_{k+1}^i = \alpha_i \bar{x}_k^i + (1 - \alpha_i) x_k^i,$$

where $\alpha_i$ are positive weights, summing to 1. The authors have shown
that all the limit points of the sequence generated by the previous algorithm
are optimal.

In Q the authors have proposed a decomposition of the problem (CCDC)
into a set of local subproblems that are solved iteratively by a network of
agents. Each subproblem is obtained from (CCDC) by discarding from the
objective $f$ the terms that do not depend on $x^i$ and by keeping only the
constraint set $X^i$. A distributed algorithm based on the method of feasible
directions has been proposed to generate the iterations of the agents:

$$x_{k+1}^i = x_k^i + \alpha_k^i d_k^i,$$

where the local descent direction is $d_k^i = \bar{x}_k^i - x_k^i$, for
$\bar{x}_k^i \in X^i$, and the step size $\alpha_k^i$ satisfies the Armijo rule
[33]. The local iterations require relatively low effort and arrive at a
solution of (CCDC) at the expense of slower convergence and high communication
among neighboring agents.

M     N     σ      nr. it. Jacobi    nr. it. Gauss-Seidel
10    40    0.1    12,435            3,834
10    40    1      1,413             365
10    40    10     174               68

Table 3: Cooperative control problem for satellite formation, Example 2.3: we
consider $M = 10$ satellites and a prediction horizon $N = 40$. The weighting
matrices are taken as $Q_i = I$ and $R_i = \sigma I_3$ and the accuracy of the
solution is ε = 10""'. By eliminating the states we obtain the convex quadratic
program (10) with $x^i = [(u_0^i)^T \cdots (u_{N-1}^i)^T]^T$ and a strongly
convex objective function having the convexity parameter $\sigma$. Clearly, for
large $\sigma$ both algorithms work better.

From Tables 1, 2 and 3 we can observe that, in order to get an optimal
solution, we need to perform a large number of iterations. Note, however, that
in practical control applications it is not always necessary to obtain an
optimal solution; we can also use a suboptimal solution that still preserves
some fundamental properties of the system such as robustness, stability, etc.
Whenever a suboptimal solution is satisfactory, we can stop the optimization
algorithm at an intermediate iteration. Note that there exist many control
strategies based on this principle of suboptimality (see e.g. [39], [45],
[49]).



4. Conclusions 

This paper has presented three applications from estimation and process
control for networked systems that lead to coupled optimization problems with
particular structure that can be exploited in decomposition algorithms. A
systematic framework is then developed in the paper to explore several parallel
and distributed algorithms for solving such structured optimization problems,
each with a different tradeoff among convergence speed, message passing amount,
and distributed computation architecture. For each application, numerical
experiments on several parallel and distributed algorithms are provided.